Hamming-like distances for ill-defined strings in linguistic classification

Authors

  • Luca Bortolussi
  • Andrea Sgarro
Abstract

Ill-defined strings often occur in the soft sciences, e.g. in linguistics or biology. In this paper we consider strings of length ℓ that have, in each position, one of three symbols: 0 (false), 1 (true), or a third symbol meaning irrelevant. We tackle some generalisations of the usual Hamming distance between crisp binary strings, which have recently been used in computational linguistics. We comment on their metric properties, since these should guide the choice of the clustering algorithm used for language classification. The concluding section is devoted to future work, and the string approach, as currently pursued, is compared with alternative approaches.
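To make the point about metric properties concrete, here is a minimal Python sketch. The encoding of the irrelevant symbol as `*` and the rule in the hypothetical `hamming_skip_irrelevant` (ignore any position where either string is irrelevant) are illustrative assumptions, not the generalisations defined in the paper; all function names are hypothetical.

```python
# Hypothetical encoding (assumption): "0" = false, "1" = true, "*" = irrelevant.
IRRELEVANT = "*"
ALPHABET = {"0", "1", IRRELEVANT}


def _check(x: str, y: str) -> None:
    """Reject strings of different length or with symbols outside ALPHABET."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    if not set(x) <= ALPHABET or not set(y) <= ALPHABET:
        raise ValueError("symbols must be '0', '1' or '*'")


def hamming(x: str, y: str) -> int:
    """Usual Hamming distance: number of positions where x and y differ
    ('*' is treated as an ordinary third symbol)."""
    _check(x, y)
    return sum(a != b for a, b in zip(x, y))


def hamming_skip_irrelevant(x: str, y: str) -> int:
    """One possible generalisation (an assumption, not the paper's definition):
    count positions where both symbols are relevant and differ; a position
    where either string holds IRRELEVANT contributes 0."""
    _check(x, y)
    return sum(
        a != b
        for a, b in zip(x, y)
        if a != IRRELEVANT and b != IRRELEVANT
    )


def triangle_holds(d, x: str, y: str, z: str) -> bool:
    """Check the triangle inequality d(x, z) <= d(x, y) + d(y, z) for one triple."""
    return d(x, z) <= d(x, y) + d(y, z)


if __name__ == "__main__":
    x, y, z = "0", "*", "1"
    print(hamming("0110", "0011"))                           # 2
    print(hamming_skip_irrelevant(x, y))                     # 0
    print(hamming_skip_irrelevant(y, z))                     # 0
    print(hamming_skip_irrelevant(x, z))                     # 1
    print(triangle_holds(hamming_skip_irrelevant, x, y, z))  # False
```

Under this skip rule, d(0, *) = d(*, 1) = 0 while d(0, 1) = 1, so the triangle inequality fails; distinct strings can also be at distance 0. Such a distance is not a metric, which is the kind of property that restricts which clustering algorithms can safely be applied.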

Similar resources

Capacity of Bidirectional Associative Memory

The capacity of bidirectional associative memory (BAM) has been examined extensively, but not completely. In particular, it has not been investigated in the context of string coding. In this paper we apply different approaches to estimate the capacity of BAM for string coding. One of these approaches is recalling all coded strings. Another is applying Hamming and Levenshtein distances ...

Computational dialectology in Irish Gaelic

Dialect groupings can be discovered objectively and automatically by cluster analysis of phonetic transcriptions such as those found in a linguistic atlas. The first step in the analysis, the computation of the linguistic distance between each pair of sites, can be carried out as the Levenshtein distance between phonetic strings. This correlates closely with the much more laborious technique of determinin...

Capacity Inverse Minimum Cost Flow Problem under the Weighted Hamming Distances

Given an instance of the minimum cost flow problem, a version of the corresponding inverse problem, called the capacity inverse problem, is to modify the upper and lower bounds on arc flows as little as possible so that a given feasible flow becomes optimal to the modified minimum cost flow problem. The modifications can be measured by different distances. In this article, we consider the capac...

Efficient Algorithms for Some Variants of the Farthest String Problem

The farthest string problem (FARTHEST STRING) is one of the core problems in the field of consensus word analysis and several biological problems such as discovering potential drugs, universal primers, or unbiased consensus sequences. Given k strings of the same length L and a nonnegative integer d, FARTHEST STRING is to find a string s such that none of the given strings has a Hamming distance...

Approximate Regular Expression Matching

We extend the definition of the Hamming and Levenshtein distances between two strings used in approximate string matching so that these two distances can also be used in approximate regular expression matching. We then present methods for constructing nondeterministic finite automata for approximate regular expression matching under both distances.


Publication date: 2007